

# International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

(A High Impact Factor, Monthly, Peer Reviewed Journal) Website: <u>www.ijareeie.com</u> Vol. 9, Issue 2, February 2020

# Programmable FFT Processor for 4G, Wireless LAN and Future 5G Communications

Ch.Hemalakshmi<sup>1</sup>, Dr.Shaik Mastan Vali<sup>2</sup>

M.Tech Student, Department of ECE, MVGR College of Engineering, Vizianagaram, Andhra Pradesh, India<sup>1</sup>

Professor, Department of ECE, MVGR College of Engineering, Vizianagaram, Andhra Pradesh, India<sup>2</sup>

**ABSTRACT**: (FFT) The high flexible programmable fast Fourier transform processor is implemented and supporting 16- to 4096-point FFT's and 12- to 2400-point discrete Fourier transforms(DFT's) for 4G, wireless local area network, and future 5G.A 16-path data parallel memory-based architecture is selected as a tradeoff between throughput and cost. And design an -efficient high-speed processor hardware, several improvements are provided. Twiddle factor multipliers using different schemes are optimized and compared, wherein modified coordinate rotation digital computer scheme is finally implemented to minimize the hardware cost while supporting both FFTs and DFTs. The processor is designed as a general IP and can be implemented using a processor synthesizer The processor design automation with optimized result parameters by using 65-nm technology the area of the processor is 1.46 mm2. The processor supports972 MS/s 4096-point FFT at 250 MHz with a power consumption of 49.33 mW and a signal-to-quantization-noise ratio of 42.14 dB .The proposed processor has better-normalized throughput per area unit than the state-of-the-art available designs.

**KEYWORDS**: (OFDM)Orthogonal Frequency Division Multiplexing, (FFT) Fast Fourier Transform, (EDA)Electronic Design Automation.

### **I.INTRODUCTION**

The implementation of FFT processor for OFDMA system enables us to study the MCM (Multi Carrier Modulation) techniques by integrating an FFT processor with the existing OFDMA system Fast Fourier transform (FFT) is a compute-intensive algorithm in the physical layer of an orthogonal frequency division multiplexing (OFDM) system to convert data between time domain and frequency domain . Many OFDM systems such as 4G LTE/LTE-A [1] and wireless local area network (WLAN) [2], [3] require power-of-two FFTs. LTE uplink preceding requires non power-oftwo discrete Fourier transforms(DFTs) from 12 to 2400. In the upcoming 5G [4] (the fifth generation mobile communication), FFT is still an essential algorithm for all of the waveform candidates, and the FFT computation speed should be high enough to support the high FPGA implementations to make use of these modern technologies of data transmission, it is necessary to develop the efficient techniques of digital modulation. In this paper, the contributions are to propose a memory-based FFT processor supporting 54 modes including16-4096 point FFTs and 12-2400 point DFTs for 4G, WLAN, and future 5G and propose improvements on critical parts in butterfly unit, TF multiplier, and data access scheme.. "The FFT has been called the most important numerical algorithm of our lifetime. The FFT is a fast algorithm for computing the DFT, If we take the 2-point DFT and 4-point DFT and generalize them to 8-point, 16 point,4096 The FFT algorithm. There are many distinct FFT algorithms involving a wide range of mathematics, from simple complex-number arithmetic to group theory and number theory. A DFT decomposes a sequence of values into components of different frequencies. This operation is useful in many fields, but computing it directly from the definition is often too slow to be practical. An FFT is a way to compute the same result more quickly: computing a DFT of N points in the naive way, using the definition, takes O(N2) arithmetical operations, while an FFT can compute the



# International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

(A High Impact Factor, Monthly, Peer Reviewed Journal)

#### Website: <u>www.ijareeie.com</u>

#### Vol. 9, Issue 2, February 2020

same result in only O(N log N) operations. The difference in speed can be substantial, especially for long data sets where N may be in the thousands or millions, in practice, the computation time can be reduced by several orders of magnitude in such cases, and the improvement is roughly proportional to N / log (N)[5].

#### II. AN OVERVIEW OF MEMORYBASED FFT PROCESSOR

A Programmable Fast Fourier Transform (FFT) Processor, essentially identical to coded OFDM (COFDM) and discrete multi-tone modulation (DMT High quadratic-amplitude modulation requires high data precision. The requirement on the maximum throughput can be calculated by the maximum bandwidth multiplying the number of streams that one processor handles. In this paper, we assume that one FFT processor supports at most 8-stream4096-point FFT computations[6]. The requirement on throughput(RqOnThrpt) is computed as follows, where the cyclic prefix samples are not removed. Therefore, the RqOnThrpt is overestimated. The hardware design based on this requirement can provide sufficient performance for 5G. OFDM(Orthogonal frequency –division multiplexing) has developed into a popular scheme for wideband digital communication, whether Wireless or over copper wires, used in applications such as digital television and audio broadcasting, wireless networking and broadband internet access. The system maps the input bits into complex-valued symbols X(n) in the modulation block, which determines the constellation scheme of each subcarrier. The number of bits assigned to each subcarrier is based on the signal to noise ratio of each subcarrier on the frequency range The adaptive bit loading algorithm will be detailed below.

Compared with pipelined butterfly unit [5], [8], general radix-r butterfly unit is easier to support additional 3- and5radix by reusing the adders and multipliers. Here fore ,in this design, the reference architecture is set as memory based architecture with general radix-r butterfly unit(s). The proposed processor is optimized based on the reference architecture[7]. For an FFT processor that has k radix-r butterfly units in Fig. 1, an N-point FFT can be decomposed into several cascaded radix-r stages. The number of computation stages is:

RqOnThrpt=8\*122.88MHz=983.04MS/s

$$RqOnLtcy = \frac{8 \times 4096}{RqOnThrpt}$$

=33.34us  $\leq 4$ ms

No Stage = 
$$[\log_2 N / \log_2 r]$$

(3)

$$Clks_{stage} = N / (k \times r)$$

### (4)

 $Clks_{stage} \times No.stage + ClkS_{Overhead} \leq (N / RqOnThrpt) \times R$ 

In general, the data sequence x(n) is also assumed to be complex valued. Similarly, The IDFT becomes

$$x(n) = \frac{1}{N} \sum_{n=0}^{N-1} X(k) W_N^{-nk}, \qquad 0 \le n \le N - 1$$
-(6)

An FFT is any method to compute the same results in  $O(N \log N)$  operations. More precisely, all known FFT algorithms require  $\Theta(N \log N)$  operations (technically, *O* only denotes an upper bound), although there is no known proof that better complexity is impossible.

99

-(1)

(2)

-(5)



(A High Impact Factor, Monthly, Peer Reviewed Journal)

Website: www.ijareeie.com

### Vol. 9, Issue 2, February 2020

# TABLE: 1 Hardware cost of Generation of Architecture

|   | Continuous flow | r  | k | R2 | Mul   | Mem             | Cost |
|---|-----------------|----|---|----|-------|-----------------|------|
| 1 | v               | 8  | 2 | 24 | 4+14  | $3 \times 4096$ | 2640 |
| 2 | 1               | 16 | 1 | 32 | 8+15  | 3 × 4050        | 2970 |
| 3 |                 | 8  | 4 | 48 | 8+28  |                 | 3280 |
| 4 | N               | 16 | 1 | 32 | 8+15  | 2	imes 4096     | 2470 |
| 5 |                 | 32 | 1 | 80 | 28+31 |                 | 4750 |

1)Continuous-Flow Processor: For continuous-flow FFT Processor that requires no I/O cycles, the memory size is  $3 \times N$  (single port), which corresponds to three memory groups in Fig. 1. Using N = 4096 and (1), (5) can be simplified as follows:

| 4096                    | $\log_2 4096$ | $+0 \le 4096 \times 250$ |
|-------------------------|---------------|--------------------------|
| $\overline{k \times r}$ | $\log_2 r$    | 983.04MS / s             |

(7)

2) Non continuous-Flow Processor: For non continuous-flow FFT processors, an extra I/O stage is needed between two adjacent FFTs for reading input of the next FFT and writing result of the previous FFT. The memory size is  $2 \times N$  (single port), which corresponds to two memory groups in Fig. 1.Using N = 4096 and (1), (5) can be simplified as follows:

| 4096 | $\log_2 4096$ | 4096 | 4096×250             |
|------|---------------|------|----------------------|
|      |               |      | 983.04 <i>MS / s</i> |

(8)

#### **III.MIXED-RADIX ALGORITHM**

The DFT is defined as

$$\begin{cases} X[K] = \sum_{n=0}^{N-1} x[n] W_N^{NK} \\ W_N^{nk} = \exp\left(-j\frac{2\pi nk}{N}\right) \end{cases}$$

Where x[n] and X[k] are the input and output sequences, and the DFT size is N. If N is not a prime number, the original DFT can be decomposed into cascaded smaller



(A High Impact Factor, Monthly, Peer Reviewed Journal)

Website: www.ijareeie.com

# Vol. 9, Issue 2, February 2020



Fig. 2. Structure of the previously proposed FFT processor

FFT size radices using the mixed-radix algorithm. Suppose that an *N*-point DFT can be decomposed into *S*-cascaded small-size DFT stages as shown in the following equation.

$$\begin{cases} N = \prod_{j=1}^{s} N_{j} \\ n = \sum_{m=1}^{s-1} \left( \prod_{n=m+1}^{s} (N_{n}) n_{m} \right) + ns \\ k = k_{1} \sum_{m=2}^{s} \left( \prod_{n=1}^{m-1} (N_{n}) k_{m} \right) \end{cases}$$

#### B. The overall design of the FFT processor Architecture

The FFT module is the core part in the OFDMA system, IEEE Std 802.16-2005 defined clearly[8]: the core module of OFDMA physical layer is the FFT module, which can be used in the FFT points are: 2048 points (1024 points, 512 points and 128 points. The design about variable point FFT processor is just based on FFT module in OFDMA system application. According to the idea of two-dimensional Fourier algorithm, i.e. N = 128, NJ = 2, N2 = 64. From 128 = 2 \* 64, we find that when achieve 128-point FFT.

Firstly the data is arranged in 64 lines and 2 rows, Secondly the input data will transform the 64 points FFT, then the result multiplies twiddle factor, Thirdly, let the result do 2 points FFT. Similarly, Calculation of the 512-point FFT firstly do the 64 points FFT, then transform further 8 points FFT; The same way to calculate the 1024 and 2048 points FFT. Block diagram of the overall design of the FFT processor as shown in Fig.2. & In Fig.3. We see that the architecture of the overall design described by Verilog language.

#### 1) 64-point FFT module

This module is the most frequently used in the design. Four kinds of input data length all must first pass through the 64 points FFT module. The same idea in the module based on 2D Fourier transform algorithm is composed of two 8-point FFT modules. So 8-point FFT module is the kernel in this part, its performance affects the whole design.

#### 2) Select and control module

This part is the kernel to complete the alterable data length in the design. It is based on the input data points to select the results stored in the middle data memory, and then choose the way to next flow. A two bits signal 'mode' is chosen as



# International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

(A High Impact Factor, Monthly, Peer Reviewed Journal)

#### Website: www.ijareeie.com

### Vol. 9, Issue 2, February 2020

the mode signal. When mode = 00, means to choose 2 points FFT module, to complete the 128-point FFT; when mode = 01, means to choose 8 FFT module, to complete the 512-point FFT; Equally: When the mode = 10, means to complete 1024 point FFT; when mode = 11, means to complete the 2048 point FFT.

#### A. Butterfly Unit

To support different radices, multiple butterfly units have been proposed. A unified butterfly unit in [2] supports radix-2, -3, -4, -5, and -7 butterfly operations by reusing the hardware adders and multipliers. The enhanced delay element matrix unit in [7] supports radix-2, -3, -4, -5, -8,-9, -16, and -25 butterfly operations by using the 2-D DFT factorization method. The high radix small butterfly (HRSB) unit in [18] supports radix-2, -3, -4, -5, -8, -9, -12, -15, -16, and -25 butterfly operations by using a two-stage multipath delay .



It is important to notice that in that mapping it is just made a conversion of bits for the faster that acts, however it is not made any modulation, as in the case of QAM, because that as shown, it is done by IFFT. It is necessary to specify how the constellation will be to be mapped, to implement that block. However, independently of the format of the constellation, the block encoder can be made through a consultation at a conversion table, implemented by LUT that exists in LCs of FPGAs. For instance, for a 4-QAM constellation in such a way that a and b are binary numbers of 3 bits, and are converted to complement two.

#### *3)* Decoder of the constellation

In the receiver, the point of the constellation transmitted it can have changed due to the noises of the transmission channel, mistake in the time of sampling of the receiver and several other causes. Therefore it is necessary to define a threshold so that it can be made the decision on which point in the constellation the received sign is acting. That is the function of the decoder. For the system exemplified above the bit 0 is converted for 010b and the bit 1 for 110b[9]. In that case, the decoder is implemented in a simple way, sticks to the most significant (that indicates the sign) bit to do the decoding, and generating a binary number of m bits again. For systems in that the constellation diagram is larger than 4-Thus the four N/4-point DFTs F(l, q)obtained from[10],[9]. The expression for combining the N/4-point DFTs defines a radix-4 decimation-in-time butterfly, which can be expressed in matrix form asNote that each butterfly involves three complex multiplications, since WN0 = 1, and 12 complex additions.

#### A. Radix-4 FFT Algorithm

When the number of data points N in the DFT is a power of 4 (i.e., N = 4v), we can, of course, always use a radix-2 algorithm for the computation. However, for this case, it is more efficient computationally to employ a radix-r FFT algorithm.

Let us begin by describing a radix-4 decimation-in-time FFT algorithm briefly. We split or decimate the N-point input sequence into four subsequences, x(4n), x(4n+1), x(4n+2), x(4n+3), n = 0, 1, ..., N/4-1.



(A High Impact Factor, Monthly, Peer Reviewed Journal)

Website: www.ijareeie.com

### Vol. 9, Issue 2, February 2020

### IV. THE PROPOSED ARCHITECTURE



Fig.4. Architecture of Proposed FFT using OFDM.

Fig.4. shows that the signal is transmitted from series to parallel and again parallel to series with n bits as input signals. and QAM is signal is given to FFT processor in the base it is given for 8bit and her we designed it for 16 bit and using OFDM and finally the c(t) signal is given to the channel. here the time consumption is less compared to the previous models and the are will less because it takes only one path to store two types of data.

### **IV.LOW POWER CONSUMPTION CALCULATIONS**

An architecture FFT Processor is synthesized using smaller technology, the result is, obviously, also smaller. To compensate for the differences in technology, the Normalized Area is calculated. This equation, presented in normalizes the area to the smallest technology in the comparison, in this study the smallest technology is 65nm

*Normalized Area* = 
$$\frac{Area}{(Techno \log y / 65nm)^2}$$

Eq: 1

The power consumption per butterfly, the smaller length FFTs are still somewhat favored, the power consumption per butterfly can put this in perspective. It is also an indication of how much more power the high speed cores use compared to the low speed cores.

$$PowerperButterflyoperation(P / B) = \frac{Power}{Number of butterfiles}$$

Eq: 2

Power and Energy

Energy  $\times$  time the value of considering a merit function which incorporates energy-efficiency and performance A popular metric which does this is energy  $\times$  time (or E $\times$ T), where time is the same as delay and is proportional to 1/frequency. measured E  $\times$  T versus supply voltage. For Spiffee1, E  $\times$  T per 1024-point complex FFT is,



# International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

(A High Impact Factor, Monthly, Peer Reviewed Journal)

Website: www.ijareeie.com

### Vol. 9, Issue 2, February 2020

#### V.LOW POWER IMPLEMENTATION APPROACHES

Power dissipation in a CMOS transistor depends on the capacitance, supply voltage and the rate at which the data toggles.

 $P=f * C_{load} VDD^2$ Where, Cload is the load capacitance of the CMOS transistor VDD is the supply voltage f is the frequency at which the data transition happens Ex: 1

Sqnr Calculations as per data sheet rule of thumb, that each bit more gives you about 6 dB more SNR. Apply A/2 rules

$$SNR_{db} = 10*10\log_{10}\left(\frac{2^{2N}}{c^2}\right)$$

(10) SNR=3.01\*14=42.14 dB

#### VI. SYNTHESIS RESULTS

In this section synthesis results corresponding to the processor are detailed. Beyond that point, the standard hardware synthesis flow is adopted in generating a net list, and estimating parameters such as gate count, area, power dissipation, etc. Synthesis results were generated using Xilinx, a powerful synthesis tool Xilinx provides details about the area occupied by the different blocks. The technology setting was for 130nm CMOS. The value of Clock Speed (C.S.) used is 100 MHz, the area corresponding to logic blocks comes to 2.03mm<sup>2</sup>. There would be a fractional increase of approximately 20% after routing, giving an area of 2.44mm<sup>2</sup> As can also be noted, the dominant area is by the memories, here chosen to allow the processor capability to scale up to an order of a 1024-point FFT.An advantage of the twiddle factor driven approach is that, when not required, the twiddle factor memory can be turned off. This advantage becomes more prominent towards the later stages of the FFT, as the number of butterflies that make use of the same twiddle factor set increases.

| data_img(31:0)  | data_out_img(31:0)  |  |
|-----------------|---------------------|--|
| data_real(31:0) |                     |  |
| clk             | data_out_real(31:0) |  |
| data_rdy        |                     |  |
| rst             | out_rdy             |  |
|                 |                     |  |

Fig.5. Programmable FFT processor synthesized RTL Schematic.



(A High Impact Factor, Monthly, Peer Reviewed Journal) Website: <u>www.ijareeie.com</u>

# Vol. 9, Issue 2, February 2020



Fig.6.Internal; structure of an PFFT processor

The synthesized FFT processor chip which is obtained after simulation and synthesizing the High Speed FFT processor for OFDMA System Using FPGA.



# VII.SIMULATION RESULTS

Fig.7. FFT Processor simulation output



(A High Impact Factor, Monthly, Peer Reviewed Journal)

Website: <u>www.ijareeie.com</u>

# Vol. 9, Issue 2, February 2020



Fig.8 FFT Processor power report(49.33).

Data flow structures \_1



Fig.9. FFT Processor Memory unit Dataflow output

### Data flow structures \_2



Fig.10.FFT Processor Dataflow output



(A High Impact Factor, Monthly, Peer Reviewed Journal)

Website: www.ijareeie.com

# Vol. 9, Issue 2, February 2020



Fig.11.Delay report of FFT processor



.Fig.12.Area report of FFT processor

### IX. HARDWARE IMPLEMENTATION

The synthesis process would also produce a bit stream file that can be downloaded in the FPGA board. The bit stream file of the Programmable Fast Fourier Transform has been successfully downloaded to diligent FPGA family of board after installing necessary drivers on PC. The test operation the physical functionality of the PFFT has been done by simply interfacing a function generator to apply input data and oscilloscope to monitor the recovered data the board nexys4DDR.

#### X.IMPLEMENTATION RESULTS

After compiling the VERILOG code busing Xilinx14.3 and downloading the bit streams successfully to nexys4DDR kit, TTL data from function generator of rate 500 KHz has been applied to the kit while the output has been measured by an oscilloscope. Output data when the input to the control circuit is logic 1.The power consumption reduced more than 30% 49.33mW.sqnr 42.14dB.as per existing system the tables extract from Xilinx power analysis menu

#### **XI.CONCLUSION**

In keeping with the design choices enumerated in this section, the FFT processor description is carried out in Xilinx, and the corresponding application kernel is written in Verilog language. The design uses a variable point FFT processor



# International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

(A High Impact Factor, Monthly, Peer Reviewed Journal)

#### Website: www.ijareeie.com

### Vol. 9, Issue 2, February 2020

designed using FPGA and was applicable to memory-based FFT processor is proposed to support 12- to 2400-point DFTs and 16- to 4096-point FFTs for 4G, WLAN, and future 5G.By reusing the hardware resources, a system The processor design uses Verilog language to describe the circuit, used Xilinx software to build the model and to verify the Timing diagram verify on Mentor Graphics (Model Sim).

A FFT algorithm is chosen in comparison to a DFT algorithm owing to its suitability towards the twiddle factor driven approach.on 16-bit, and limited to 100MHz clock Frequency, the overall timing interval design with stability condition FFT processor processing with help of inside programmable Memory This design can be applied to Real time processing system, completes the main computing modules in the Programmable FFT Processor.

#### REFERENCES

[1] IEEE Std 802.16e<sup>TM</sup>-200S and IEEE Std 802.16<sup>TM</sup>- 2004/Corl- 200S.

[2] WiMAX Forum, Krishna Ramadas and Raj Jain, "WiMAX System Evaluation Methodology" version 2.1 July 2008.

[3] 3GPP TR 25.814 v.I.S, Physical Layer Aspects for Evolved UTRA(release 7), May 2006.

[4] Multiband OFDM Alliance.

[5] Multiband OFDM Physical layer Proposal for IEEE 802.15 Task group 3a IEEE P802.15-03/141r1, March 2004.

[6] Multiband OFDM Physical layer Proposal Update IEEE P802.15-04/0220r1, May 2004.

[7] Multi-band OFDM: a new approach for UWB Batra, A.; Balakrishnan, J.; Dabak, A.; Circuits and Systems, 2004. ISCAS '04. Proceedings of the 2004 International Symposium on Volume 5, 23-26 May 2004 Page(s):V-365 - V-368 Vol.5.

[8] Zhang Zhu-jun, " Optimum Design of FFT Processor with Cascade Structure Based on FPGA," J. Nanjing University of Science and Technology. 2009. in press.

[9] Xie Yan-Iin. "Design and Implementation of a Scalable Pipeline FFT Processor Based on FPGA" D. Xi Dian University.2007, in press.

[10] An Algorithm for Machine computation of Complex Fourier Series

### BIOGRAPHY



Dr.S.M.Vali received his Ph.D from Andhra University in 2013.He is working as a Professor in the Department of Electronics and Communication Engineering at MVGR College of Engineering(A).His research interests include Antennas, Slotted Waveguide junctions VLSI and signal processing.



CH.Hema Lakshmi oursuedB.Tech from Miracle college of Engineering, Bhogapuram, Vizianagaram in 2017.Presently Pursuing M.Tech(VLSI System Design) from MVGR College of Engineering.